Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning

Authors

  • Arunselvan Ramaswamy
  • Shalabh Bhatnagar
Abstract

In this paper we present a ‘stability theorem’ for stochastic approximation (SA) algorithms with ‘controlled Markov’ noise. Such algorithms were first studied by Borkar in 2006. Specifically, sufficient conditions are presented which guarantee the stability of the iterates. Further, under these conditions the iterates are shown to track a solution to the differential inclusion defined in terms of the ergodic occupation measures associated with the ‘controlled Markov’ process. As an application of our main result we present an improvement to a general form of temporal difference learning algorithms. Specifically, we present sufficient conditions for their stability and convergence using our framework. This paper builds on the works of Borkar and of Benveniste, Métivier and Priouret.
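As a concrete illustration of the kind of stochastic approximation iterates the abstract refers to, the sketch below runs tabular TD(0) on a toy Markov chain with diminishing step sizes. This is only a minimal example of a general temporal-difference update, not the authors' algorithm; the chain, rewards, and step-size schedule are invented for the illustration.

```python
import numpy as np

# Minimal tabular TD(0) sketch on a toy 3-state Markov chain.
# Illustrative only: the chain, rewards, and step sizes are assumptions.

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # row-stochastic transition matrix
r = np.array([1.0, 0.0, 2.0])     # expected one-step reward per state
gamma = 0.9                       # discount factor

V = np.zeros(3)                   # value estimates: the SA iterates
s = 0
for n in range(1, 200_000):
    s_next = rng.choice(3, p=P[s])
    alpha = 1.0 / n ** 0.7        # diminishing steps: sum diverges, sum of squares converges
    # TD(0) update: move V[s] toward the bootstrapped target r + gamma * V[s']
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
    s = s_next

# The fixed point solves the Bellman equation V = r + gamma * P V
V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
print(np.max(np.abs(V - V_exact)))
```

Because the iterates here live in a bounded region and the step sizes are square-summable, the estimates settle near the Bellman fixed point; the paper's contribution is precisely sufficient conditions under which such boundedness (stability) is guaranteed in far more general, controlled-Markov-noise settings.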


Related papers

Continuous dependence on coefficients for stochastic evolution equations with multiplicative Lévy noise and monotone nonlinearity

Semilinear stochastic evolution equations with multiplicative Lévy noise are considered. The drift term is assumed to be monotone nonlinear and of linear growth. Unlike other similar works, we do not impose coercivity conditions on the coefficients. We establish the continuous dependence of the mild solution on the initial conditions and also on the coefficients. As corollaries of ...


Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

We present for the first time an asymptotic convergence analysis of two-timescale stochastic approximation driven by controlled Markov noise. In particular, both the faster and slower recursions have non-additive Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both times...


Asymptotic and non-asymptotic convergence properties of stochastic approximation with controlled Markov noise without ensuring stability

This paper studies both the asymptotic and non-asymptotic convergence properties of stochastic approximation algorithms with controlled Markov noise when stability of the iterates (an important condition for almost sure convergence) is hard to prove. We do so by providing a lower bound on the lock-in probability of such frameworks, i.e. the probability of convergence to a sp...


Approximation of stochastic advection diffusion equations with finite difference scheme

In this paper, a high-order and conditionally stable stochastic difference scheme is proposed for the numerical solution of the Itô stochastic advection-diffusion equation with a one-dimensional white noise process. We apply a fourth-order finite difference approximation to discretize the spatial derivative of this equation. The main properties of deterministic difference schemes,...


Average cost temporal-difference learning

We propose a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally during a single endless trajectory of the Markov chain. We present a proof of convergence (with probability 1) and a characterization o...



Journal:
  • CoRR

Volume: abs/1504.06043  Issue: 

Pages: -

Publication year: 2015